library(tibble)

set.seed(123)
mydata <- tibble(
  normal = rnorm(
    n = 200,
    mean = 50,
    sd = 5
  ),
  non_normal = runif(
    n = 200,
    min = 45,
    max = 55
  )
)
Day 3 - Introduction to Data Analysis with R
Freie Universität Berlin - Theoretical Ecology
October 2, 2023
There are various tests and the outcome might differ!
Shapiro-Wilk-Test
Visual tests: QQ-Plot
Create a tibble with two variables
normal: 200 normally distributed values with mean 50 and standard deviation 5
non_normal: 200 uniformly distributed values between 45 and 55
\(H_0\): Data does not differ from a normal distribution
Shapiro-Wilk normality test
data: mydata$normal
W = 0.99076, p-value = 0.2298
The data does not deviate significantly from a normal distribution (Shapiro-Wilk-Test, W = 0.991, p = 0.23).
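The call that produces output like the one above can be sketched as follows. This is a minimal sketch that rebuilds the two variables as plain base R vectors instead of tibble columns (an assumption made to keep the example free of package dependencies); the object names `res_normal` and `res_non_normal` are chosen for illustration.

```r
# Recreate the example data as plain vectors.
# rnorm() is called before runif(), as in the tibble definition.
set.seed(123)
normal <- rnorm(n = 200, mean = 50, sd = 5)
non_normal <- runif(n = 200, min = 45, max = 55)

# H0: the data do not differ from a normal distribution
res_normal <- shapiro.test(normal)
res_non_normal <- shapiro.test(non_normal)

res_normal
res_non_normal

# The result is a list, so W and p can be extracted for reporting
round(unname(res_normal$statistic), 3)
round(res_normal$p.value, 2)
```

A small p-value speaks against normality; a large p-value only means that no significant deviation was detected.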
Points should match the straight line. Small deviations are okay.
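A QQ-plot of this kind can be drawn with base R graphics, as in the sketch below. The original slides may well have used ggplot2 for the figure; base R is used here only as an assumption to keep the example dependency-free.

```r
set.seed(123)
normal <- rnorm(n = 200, mean = 50, sd = 5)
non_normal <- runif(n = 200, min = 45, max = 55)

# Two panels side by side; remember the old settings to restore them later
old_par <- par(mfrow = c(1, 2))

# Sample quantiles plotted against theoretical normal quantiles
qq_normal <- qqnorm(normal, main = "normal")
qqline(normal) # reference line through the first and third quartiles

qq_non_normal <- qqnorm(non_normal, main = "non_normal")
qqline(non_normal)

par(old_par)
```

For the uniform data the points typically bend away from the line at both ends, because a uniform distribution has lighter tails than a normal distribution.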
Counts of insects in agricultural units treated with different insecticides.
First, test for normal distribution!
F-Test
Levene test
If we want to compare variances between treatments A, B and C, we first test for normal distribution
Shapiro-Wilk normality test
data: TreatA
W = 0.95757, p-value = 0.7487
Shapiro-Wilk normality test
data: TreatB
W = 0.95031, p-value = 0.6415
Shapiro-Wilk normality test
data: TreatC
W = 0.92128, p-value = 0.2967
Result: None of the 3 treatments deviates significantly from a normal distribution.
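The three normality tests can be run in one go, as sketched below. The slides do not show how `TreatA`, `TreatB` and `TreatC` were created; this sketch assumes they hold the counts for sprays A, B and C of the built-in `InsectSprays` data, so the numbers it prints need not match the output quoted above.

```r
# Assumption: one vector of counts per spray from the built-in InsectSprays data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatB <- InsectSprays$count[InsectSprays$spray == "B"]
TreatC <- InsectSprays$count[InsectSprays$spray == "C"]

# Run the Shapiro-Wilk test per treatment and collect W and the p-value
treatments <- list(A = TreatA, B = TreatB, C = TreatC)
normality <- sapply(treatments, function(x) {
  res <- shapiro.test(x)
  c(W = unname(res$statistic), p = res$p.value)
})

# One column per treatment, rows W and p
round(normality, 3)
```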
\(H_0\): Variances do not differ between groups
F test to compare two variances
data: TreatA and TreatB
F = 1.2209, num df = 11, denom df = 11, p-value = 0.7464
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3514784 4.2411442
sample estimates:
ratio of variances
1.22093
Result: The variances of sprays A and B do not differ significantly (F-Test, \(F_{11,11}\) = 1.22, p = 0.75)
\(H_0\): Variances do not differ between groups
F test to compare two variances
data: TreatA and TreatC
F = 7.4242, num df = 11, denom df = 11, p-value = 0.002435
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
2.137273 25.789584
sample estimates:
ratio of variances
7.424242
Result: The variances of sprays A and C differ significantly (F-Test, \(F_{11,11}\) = 7.42, p = 0.002)
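The F-tests above correspond to calls to `var.test()`. The sketch below assumes that the treatment vectors are the counts for sprays A, B and C of the built-in `InsectSprays` data; the slides do not show how the vectors were prepared, so individual numbers may differ from the quoted output.

```r
# Assumption: treatment vectors taken from the built-in InsectSprays data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatB <- InsectSprays$count[InsectSprays$spray == "B"]
TreatC <- InsectSprays$count[InsectSprays$spray == "C"]

# H0: the ratio of the two variances is 1
f_ab <- var.test(TreatA, TreatB)
f_ac <- var.test(TreatA, TreatC)

f_ab
f_ac

# The F statistic is the ratio of the two sample variances
var(TreatA) / var(TreatB)
```

The two degrees of freedom reported as `num df` and `denom df` are the sample sizes minus one, which gives the subscripts in \(F_{11,11}\).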
t-test
Welch-Test
Wilcoxon rank sum test
\(H_0\): The samples do not differ in their mean
Treatment A and B: normally distributed and equal variance
Two Sample t-test
data: TreatA and TreatB
t = -0.45352, df = 22, p-value = 0.6546
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.643994 2.977327
sample estimates:
mean of x mean of y
14.50000 15.33333
Result: The means of spray A and B don’t differ significantly (t = -0.45, df = 22, p = 0.65)
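A classical two-sample t-test is requested with `var.equal = TRUE`. A minimal sketch, assuming the treatment vectors are the counts for sprays A and B of the built-in `InsectSprays` data:

```r
# Assumption: treatment vectors taken from the built-in InsectSprays data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatB <- InsectSprays$count[InsectSprays$spray == "B"]

# var.equal = TRUE requests the classical Student t-test with pooled variance
t_ab <- t.test(TreatA, TreatB, var.equal = TRUE)
t_ab

# Pieces needed for reporting the result
round(unname(t_ab$statistic), 2) # t
unname(t_ab$parameter)           # degrees of freedom
round(t_ab$p.value, 2)           # p-value
```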
\(H_0\): The samples do not differ in their mean
Treatment A and C: normally distributed and unequal variance
Result: The means of spray A and C do differ significantly (t = 7.58, df = 13.9, p < 0.001)
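The Welch test is what `t.test()` runs by default, because `var.equal` defaults to `FALSE`. The sketch below assumes that the treatment vectors are the counts for sprays A and C of the built-in `InsectSprays` data; since the slides do not show how the vectors were prepared, the printed numbers may differ from the result quoted above.

```r
# Assumption: treatment vectors taken from the built-in InsectSprays data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatC <- InsectSprays$count[InsectSprays$spray == "C"]

# var.equal = FALSE is the default and gives the Welch test,
# which does not assume equal variances
welch_ac <- t.test(TreatA, TreatC, var.equal = FALSE)
welch_ac

# The Welch correction usually yields non-integer degrees of freedom
unname(welch_ac$parameter)
```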
\(H_0\): The samples do not differ in their mean
We don’t need the Wilcoxon test to compare treatment A and B, but for the sake of an example:
Result: The counts of spray A and B do not differ significantly (Wilcoxon rank sum test, W = 62, p = 0.58)
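The rank-based comparison can be sketched with `wilcox.test()`, again assuming the treatment vectors are the counts for sprays A and B of the built-in `InsectSprays` data. Setting `exact = FALSE` is a choice made for this sketch: the counts contain ties, so R would fall back to the normal approximation anyway and only add a warning.

```r
# Assumption: treatment vectors taken from the built-in InsectSprays data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatB <- InsectSprays$count[InsectSprays$spray == "B"]

# exact = FALSE: with tied values no exact p-value can be computed,
# so the normal approximation is requested explicitly
wilcox_ab <- wilcox.test(TreatA, TreatB, exact = FALSE)
wilcox_ab

unname(wilcox_ab$statistic) # W
round(wilcox_ab$p.value, 2)
```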
Are there pairs of data points?
Example: samples of invertebrates across various rivers before and after sewage plants.
Use paired = TRUE in the test.
Careful: both treatment vectors must be in the same order, so that values at the same position belong to the same pair.
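A paired test could look like the sketch below. The data frame `rivers` and all its values are made up purely for illustration; keeping both measurements in one row per river is one way to make sure the pairs stay aligned.

```r
# Hypothetical example data: invertebrate counts per river,
# before and after a sewage plant (values invented for illustration)
rivers <- data.frame(
  river  = paste0("R", 1:8),
  before = c(32, 41, 28, 36, 45, 30, 39, 34),
  after  = c(25, 38, 22, 30, 40, 31, 33, 27)
)

# paired = TRUE compares the values row by row
paired_res <- t.test(rivers$before, rivers$after, paired = TRUE)
paired_res

# Equivalent formulation: one-sample t-test on the pairwise differences
diff_res <- t.test(rivers$before - rivers$after)
diff_res
```

If one of the two vectors were sorted or shuffled independently, the pairs would no longer match and the result would be meaningless.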
ggsignif
The ggsignif package offers a geom_signif() layer that can be added to a ggplot to annotate significance levels.
geom_signif()
test: run specific test
test.args: pass additional arguments in a list
?geom_signif for more options
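How `test` and `test.args` fit together can be sketched on a boxplot of the `InsectSprays` data. The choice of comparisons and of `step_increase` is an assumption made for this example, not taken from the slides; the ggplot2 and ggsignif packages need to be installed.

```r
library(ggplot2)

p <- ggplot(InsectSprays, aes(x = spray, y = count)) +
  geom_boxplot() +
  ggsignif::geom_signif(
    # each comparison is a pair of group names from the x axis
    comparisons = list(c("A", "B"), c("A", "C")),
    # name of the test function to run for each comparison
    test = "t.test",
    # additional arguments handed to the test function
    test.args = list(var.equal = FALSE),
    # show significance stars instead of p-values
    map_signif_level = TRUE,
    # shift each bracket upwards so they do not overlap
    step_increase = 0.1
  )
p
```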
stat_summary
Another way to plot the results is to plot mean and standard error of the mean:
stat_summary, define summary function
fun.data for errorbars
fun for point values (e.g. mean); this argument was called fun.y before ggplot2 3.3.0
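To see what a `fun.data` summary function returns, `mean_se()` from ggplot2 can be called directly on a vector. The sketch below assumes `TreatA` holds the spray A counts of the built-in `InsectSprays` data and compares the result with a calculation by hand; `by_hand` is a name chosen for illustration.

```r
library(ggplot2)

# Assumption: spray A counts from the built-in InsectSprays data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]

# mean_se() returns a one-row data frame with y, ymin and ymax
mean_se(TreatA)

# The same values by hand: mean plus/minus one standard error of the mean
se <- sd(TreatA) / sqrt(length(TreatA))
by_hand <- c(
  y    = mean(TreatA),
  ymin = mean(TreatA) - se,
  ymax = mean(TreatA) + se
)
by_hand
```

`stat_summary()` calls this function once per group and maps `ymin` and `ymax` to the ends of the error bar.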
stat_summary
Just like before, you can also add a geom_signif to a barplot:
library(ggplot2)

ggplot(
  InsectSprays,
  aes(x = spray, y = count)
) +
  stat_summary(
    fun.data = mean_se,
    geom = "errorbar",
    width = 0.3
  ) +
  stat_summary(
    fun = mean, # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
    geom = "bar"
  ) +
  ggsignif::geom_signif(
    comparisons = list(
      c("A", "B"),
      c("B", "C"),
      c("A", "C")
    ),
    test = "t.test",
    map_signif_level = TRUE,
    y_position = c(17, 18, 19)
  )
Task 1 (45 min)
Statistical tests
Find the task description here
Selina Baldauf // Statistical tests